SAS CORE CODE

most useful SAS code here.

Notes here

don’t forget to create a new library before starting your SAS program or it will disappear after you close SAS.
Whenever you make some notes, use “* … ;”

Import files

To import CSV file: general version

PROC IMPORT DATAFILE = 'path\xx.csv' OUT=LIB.xxx ;
DBMS=CSV;
REPLACE;
RUN;

If for your dataset, some variables missing at the beginning, then use guessingrows:

proc import file='path\patents.csv' 
guessingrows=3000000;  *approximate number;
out=lib.name dbms=csv;
run;

merge tables

Alt text

inner join

proc sql; 
create table lib.name as 
select a.xxx, a.xxx as xxx, b.* from lib1.name as a, lib2.name as b where      
input(a.PatentNo,7.)=b.patentno;
quit;

Above case show when variable Patentno is a string while b.patentno is a number, convert a.Patentno from string to number. (the length of patentno is 7)

left join

proc sql; 
    create table lib.name as 
select a.*,  b.xxx, b.xx from lib1.xxx as a
left join lib2.xxx as b 
on a.xxx=b.xxx;
quit;

notes

to merge table, proc sql is more flexible than merge

copy dataset

Data lib.newname;
set lib.name;
run;

convert some variable from string to number

data lib.newname;
set lib.newname(rename=(var=var1));
var = put(var1, 7.);
drop var1;
run;

drop unuseful info

drop variable

Data lib.newname(drop=xx xx);
set lib.name;
run;

or:

data lib.newname;
set lib.name;
drop xxx;
run;

keep variable you want

Data lib.name(keep=xxx);
set lib.name;
run;

drop duplicates

proc sort data= lib.name nodupkey
out= lib.newname;
by xxx(variable);
run;

drop missing data

data lib.name;
set lib.name;
if cmiss(xxx) then delete;
run;

Data lib.name;
Set lib.name;
if xxx=" " then delete;
run;

sort data

PROC SORT data=linkaf10.assign;
    BY PatentNo;
RUN;

deal with dates in SAS

keep obs after 2010

Data lib.name;
set lib.name;
year= year(xxx);
run;

Data lib.name;
set lib.name;
if year > 2010 then delete;
run;

standardize variables

change to uppercase

data lib.name;
set lib.name;
newvar = upcase(var);
run;

delete all the punctuation characters

data lib.name;
set lib.name;
var1=compress(var,    ,'pd') 
run;
data lib.name;
set lib.name;
rename var1 var;
run;

create distinct identification(number)

data lib.name;
set lib.name;
retain var1 2000; *start from 2000;
var+1;
run;

export file

proc export data=lib.xxx
outfile= "path\xxx.csv" 
dbms=csv replace;
run;